Metric embedding with outliers

Authors

  • Anastasios Sidiropoulos
  • Yusu Wang
Abstract

We initiate the study of metric embeddings with outliers. Given some metric space (X, ρ) we wish to find a small set of outlier points K ⊂ X and either an isometric or a low-distortion embedding of (X \ K, ρ) into some target metric space. This is a natural problem that captures scenarios where a small fraction of points in the input corresponds to noise. For the case of isometric embeddings we derive polynomial-time approximation algorithms for minimizing the number of outliers when the target space is an ultrametric, a tree metric, or some constant-dimensional Euclidean space. The approximation factors are 3, 4 and 2, respectively. For the case of embedding into an ultrametric or tree metric, we further improve the running time to O(n^2) for an n-point input metric space, which is optimal. We complement these upper bounds by showing that outlier embedding into ultrametrics, trees, and d-dimensional Euclidean space for any d ≥ 2 are all NP-hard, as well as NP-hard to approximate within a factor better than 2 assuming the Unique Games Conjecture. For the case of non-isometries we consider embeddings with small ℓ∞ distortion. We present polynomial-time bi-criteria approximation algorithms. Specifically, given some ε > 0, let k_ε denote the minimum number of outliers required to obtain an embedding with distortion ε. For the case of embedding into ultrametrics we obtain a polynomial-time algorithm which computes a set of at most 3k_ε outliers and an embedding of the remaining points into an ultrametric with distortion O(ε log n). Finally, for embedding a metric of unit diameter into constant-dimensional Euclidean space we present a polynomial-time algorithm which computes a set of at most 2k_ε outliers and an embedding of the remaining points with distortion O(√ε).

∗ Dept. of Computer Science and Engineering and Dept. of Mathematics, The Ohio State University, Columbus, OH, USA. Supported by NSF grants CCF 1423230 and CAREER 1453472.
† Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA. The work is partially supported by NSF under grant CCF-1319406.

arXiv:1508.03600v1 [cs.DS] 14 Aug 2015
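The abstract does not say how the factor-3 guarantee for ultrametrics is obtained. One standard route, assumed here purely for illustration, uses the fact that a metric is an ultrametric exactly when every triple of points satisfies ρ(x, z) ≤ max(ρ(x, y), ρ(y, z)). Any outlier set must contain at least one point of every violating triple, so deleting all three points of each violating triple found greedily removes at most three times the optimum. The function names below are hypothetical, and this cubic-time sketch is not the quadratic-time algorithm the abstract mentions.

```python
from itertools import combinations


def is_ultrametric_triple(d, x, y, z, tol=1e-12):
    """True if the two largest of the three pairwise distances are equal."""
    a, b, c = sorted((d[x][y], d[y][z], d[x][z]))
    return c - b <= tol


def ultrametric_outliers(d):
    """Greedy 3-approximation sketch for ultrametric outlier removal.

    d is a symmetric n x n distance matrix (list of lists).
    Returns a sorted list of outlier indices; the remaining points
    contain no violating triple, so they form an ultrametric.
    """
    n = len(d)
    alive = set(range(n))
    for x, y, z in combinations(range(n), 3):
        # Only triples whose points all survive so far are examined,
        # so the removed triples are pairwise disjoint.
        if x in alive and y in alive and z in alive:
            if not is_ultrametric_triple(d, x, y, z):
                alive -= {x, y, z}
    return sorted(set(range(n)) - alive)


# Illustrative input: points 0, 1, 2 form an ultrametric; point 3 breaks it.
example = [
    [0, 1, 2, 3],
    [1, 0, 2, 2],
    [2, 2, 0, 1],
    [3, 2, 1, 0],
]
print(ultrametric_outliers(example))  # prints [0, 1, 3]
```

In this example a single outlier (point 3) would suffice, while the greedy pass removes three points, which matches the worst case of the factor-3 bound. A single pass over all triples is enough because every triple of surviving points was examined while all three were still present.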

Similar resources

Phishing website detection using weighted feature line embedding

The aim of phishing is tracing users' private information without their permission by designing a new website which mimics the trusted website. The specialists of information technology do not agree on a unique definition for the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. M...

Outlier Detection for Robust Multi-dimensional Scaling

Multi-dimensional scaling (MDS) plays a central role in data-exploration, dimensionality reduction and visualization. State-of-the-art MDS algorithms are not robust to outliers, yielding significant errors in the embedding even when only a handful of outliers are present. In this paper, we introduce a technique to detect and filter outliers based on geometric reasoning. We test the validity of ...

An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...

Robust Metric Structure from Motion for an Extended Sequence with Outliers and Missing Data

In this paper, we propose a robust metric structure from motion (SfM) algorithm for an extended sequence with outliers and missing data. There are three main contributions in the proposed SfM algorithm. The first is a novel jury-based preemptive LMedS procedure to achieve efficient outlier detection. The second contribution is a new iterative two-step scheme that consists of robust estimation t...

A Novel Approach to Embedding of Metric Spaces

An embedding of one metric space (X, d) into another (Y, ρ) is an injective map f : X → Y . The central genre of problems in the area of metric embedding is finding such maps in which the distances between points do not change “too much”. Metric Embedding plays an important role in a vast range of application areas such as computer vision, computational biology, machine learning, networking, st...

Journal:
  • CoRR

Volume: abs/1508.03600

Publication date: 2015